Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction that aims to identify defect-inducing changes. Recent studies have found that the variability of defect data sets can affect the performance of defect predictors, and that using local models can help improve prediction performance. However, previous studies have focused on module-level defect prediction. Whether local models remain valid in the context of JIT-SDP is an important open question. To this end, we compare the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project validation, and timewise cross-validation. To build local models, the experiment uses k-medoids to divide the training set into several homogeneous regions. In addition, logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models achieve significantly better effort-aware prediction performance than global models in the cross-validation and cross-project validation scenarios. In particular, local models obtain optimal effort-aware prediction performance when the number of clusters k is set to 2. Therefore, local models are promising for effort-aware JIT-SDP.
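The local-versus-global setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean distance, uses a simple alternating k-medoids rather than a full PAM swap search, trains logistic regression with plain stochastic gradient descent, and omits the effort-aware (EALR) models. The data are hypothetical toy values standing in for two normalized change metrics, and all function names are illustrative.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, medoids):
    """Map each medoid index to the indices of the points closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, seed=0, max_iter=100):
    """Simple alternating k-medoids (not the full PAM swap search)."""
    medoids = random.Random(seed).sample(range(len(points)), k)
    for _ in range(max_iter):
        updated = []
        for m, members in assign(points, medoids).items():
            if not members:  # keep a medoid that attracted no points
                updated.append(m)
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(
                members,
                key=lambda c: sum(dist(points[c], points[j]) for j in members),
            ))
        if sorted(updated) == sorted(medoids):
            break
        medoids = updated
    return medoids, assign(points, medoids)


def predict_proba(w, x):
    """Probability that change x is defect-inducing under weights w."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Logistic regression fitted by plain stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            w[0] -= lr * err
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * err * xj
    return w


def fit_local(X, y, k):
    """Cluster the training set, then fit one classifier per cluster."""
    _, clusters = k_medoids(X, k)
    return [
        (X[m], fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
        for m, idx in clusters.items() if idx
    ]


def predict_local(local, x):
    """Route x to the model whose medoid is nearest, then predict."""
    _, w = min(local, key=lambda pair: dist(x, pair[0]))
    return predict_proba(w, x)


# Hypothetical toy data: two normalized change metrics per change,
# label 1 = defect-inducing. Values are invented for illustration only.
X = [[0.0, 0.1], [0.0, 0.3], [0.0, 0.7], [0.0, 0.9],
     [3.0, 0.2], [3.0, 0.4], [3.0, 0.6], [3.0, 0.8]]
y = [0, 0, 1, 1, 1, 1, 0, 0]

global_model = fit_logistic(X, y)  # one model for the whole training set
local = fit_local(X, y, k=2)       # one model per k-medoids cluster

new_change = [0.0, 0.8]
print("global:", predict_proba(global_model, new_change))
print("local: ", predict_local(local, new_change))
```

The global model is fitted once on all training changes, while the local variant fits one classifier per cluster and routes each new change to the model of its nearest medoid. Which variant predicts better depends on the data and the evaluation scenario, as the study reports.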